The R Markdown source is available from the Code pulldown menu at the upper right (choose “Download Rmd”), or you can download the Rmd from GitHub.
This protocol describes a network analysis workflow in Cytoscape for a set of differentially expressed genes. Points covered:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
if (!"RCy3" %in% installed.packages())
    BiocManager::install("RCy3")
library(RCy3)
First, launch Cytoscape and keep it running whenever using RCy3. Confirm that you have everything installed and running:
cytoscapePing()
cytoscapeVersionInfo()
If you haven’t already, install the STRING app and the yFiles Layout Algorithms app:
installApp('stringApp')
installApp('yfileslayoutalgorithms')
Ovarian serous cystadenocarcinoma is a type of epithelial ovarian cancer which accounts for ~90% of all ovarian cancers. The data used in this protocol are from The Cancer Genome Atlas, in which multiple subtypes of serous cystadenocarcinoma were identified and characterized by mRNA expression.
We will focus on the differential gene expression between two subtypes, Mesenchymal and Immunoreactive.
For convenience, the data have already been analyzed and pre-filtered, using log fold change value and adjusted p-value.
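The pre-filtering step itself is not shown in this protocol. As a rough illustration only, a filter of this kind might look like the sketch below. The toy values and the thresholds (absolute logFC above 1, adjusted p-value below 0.05) are assumptions made for the example, not the settings used to produce the tutorial files.

```r
# Toy differential-expression table (made-up values, for illustration only)
de <- data.frame(
  Gene = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  logFC = c(2.3, -1.7, 0.4, 1.5),
  FDR.adjusted.Pvalue = c(0.001, 0.02, 0.0005, 0.3),
  stringsAsFactors = FALSE
)

# Hypothetical thresholds; the protocol does not state the real ones
lfc.cutoff <- 1
p.cutoff <- 0.05

# Keep genes that pass both the fold-change and the significance filter
significant <- de[abs(de$logFC) > lfc.cutoff &
                  de$FDR.adjusted.Pvalue < p.cutoff, ]

# Split into up- and down-regulated gene lists
up.genes <- significant$Gene[significant$logFC > 0]
down.genes <- significant$Gene[significant$logFC < 0]

print(up.genes)
print(down.genes)
```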
Many public databases and multiple Cytoscape apps allow you to retrieve a network or pathway relevant to your data. For this workflow, we will use the STRING app. Some other options include:
To identify a relevant network, we will query the STRING database in two different ways:
df <- read.table("https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_UP.txt")
string.cmd = paste('string protein query query="', paste(df$V1, collapse = '\n'), '" cutoff=0.4 species="Homo sapiens"', sep = "")
commandsRun(string.cmd)
The resulting network contains up-regulated genes recognized by STRING, and interactions between them with an evidence score of 0.4 or greater.
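To see what the command string looks like before it is sent to Cytoscape, here is a minimal sketch that builds the same kind of query from a short gene vector. The helper name `build_string_query` and the three-gene input are hypothetical; the command text mirrors the one used above.

```r
# Hypothetical helper: assemble a STRING protein query command string
build_string_query <- function(genes, cutoff = 0.4, species = "Homo sapiens") {
  paste0('string protein query query="',
         paste(genes, collapse = "\n"),   # one identifier per line
         '" cutoff=', cutoff,
         ' species="', species, '"')
}

# Small example input (not the tutorial gene list)
genes <- c("TP53", "BRCA1", "BRCA2")
cmd <- build_string_query(genes)
cat(cmd, "\n")

# With Cytoscape running, the string could then be passed to commandsRun(cmd)
```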
getTableColumnNames('edge')
evidence.score <- getTableColumns('edge', "stringdb::score")
min(evidence.score)
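The check above can also be written as an explicit assertion. The sketch below uses a small made-up data frame as a stand-in for the one-column data frame that getTableColumns returns, so it runs without a Cytoscape session; the scores are invented for illustration.

```r
# Toy stand-in for getTableColumns('edge', "stringdb::score")
evidence.toy <- data.frame("stringdb::score" = c(0.41, 0.9, 0.655),
                           check.names = FALSE)

cutoff <- 0.4
scores <- evidence.toy[["stringdb::score"]]

# TRUE when every edge meets the cutoff used in the query
all.above <- all(scores >= cutoff, na.rm = TRUE)
print(min(scores, na.rm = TRUE))
print(all.above)
```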
Next, we are going to perform enrichment analysis using the STRING app. Note that there are several other options, including:
The STRING app has built-in enrichment analysis functionality, which includes enrichment for GO Process, GO Component, GO Function, InterPro, KEGG Pathways, and PFAM.
The STRING app includes several options for filtering and displaying the enrichment results. The features are all available at the top of the STRING Enrichment tab.
Repeat the network search, enrichment analysis and visualization for the set of down-regulated genes:
df <- read.table("https://cytoscape.github.io/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_DOWN.txt")
string.cmd = paste('string protein query query="', paste(df$V1, collapse = '\n'), '" cutoff=0.4 species="Homo sapiens"', sep = "")
commandsRun(string.cmd)
Pro-tip: If you remove the Fill Color mapping from the Style Panel (right-click > Edit > Remove…), set the default to light gray, change the split donut to a Pie Chart in Settings and then try out some of the layouts (see Layout menu), you can end up with network views like this…
Now, we will query the STRING disease database to retrieve a network of ovarian cancer associated genes, completely independent of our dataset.
string.cmd = 'string disease query disease="ovarian cancer"'
commandsRun(string.cmd)
This will bring in the top 100 ovarian cancer associated genes connected with a confidence score greater than 0.4. (We did not pass 100 and 0.4 as parameters because they are the default values.)
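If you want the defaults to be visible in the script, the command can be built with explicit values. This is a sketch that assumes the disease query accepts `cutoff` and `limit` arguments; the helper name `build_disease_query` is hypothetical, and `commandsHelp('string disease query')` lists the arguments your installed app version actually supports.

```r
# Hypothetical helper: spell out the defaults instead of relying on them
# (assumes the stringApp command accepts 'cutoff' and 'limit')
build_disease_query <- function(disease, cutoff = 0.4, limit = 100) {
  sprintf('string disease query disease="%s" cutoff=%s limit=%d',
          disease, format(cutoff), as.integer(limit))
}

cmd <- build_disease_query("ovarian cancer")
cat(cmd, "\n")

# With Cytoscape running: commandsRun(cmd)
```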
Next we will import log fold changes and p-values from our TCGA dataset and use them to create a visualization. Since the network and data use different identifiers, we first have to do some quick identifier mapping. In this case, we will use the gene symbol in the display name column to retrieve Entrez Gene identifiers.
getTableColumnNames('node')
mapped.cols <- mapTableColumn("display name",'Human','HGNC','Entrez Gene')
Here we set Human as species, HGNC as Map from, and Entrez Gene as To.
head(mapped.cols)
mapped.cols displays a report of how many identifiers were mapped. Make note of this information, as it impacts all downstream analysis: if the mapping was unsuccessful, the downstream analysis will be as well. You will notice the new column Entrez Gene in the Node Table.
tail(getTableColumnNames('node'))
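One way to quantify the mapping result is to count the missing values in the mapped column. The sketch below assumes the data frame returned by mapTableColumn holds the source column and the mapped column, with NA where no match was found; it uses a toy data frame (including one deliberately invalid symbol) so that it runs without Cytoscape.

```r
# Toy stand-in for the data frame returned by mapTableColumn()
mapped.toy <- data.frame(
  "display name" = c("TP53", "BRCA1", "BRCA2", "NOT_A_GENE"),
  "Entrez Gene" = c("7157", "672", "675", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

n.total <- nrow(mapped.toy)
n.mapped <- sum(!is.na(mapped.toy[["Entrez Gene"]]))

# Symbols that could not be mapped and will carry no data downstream
unmapped <- mapped.toy[["display name"]][is.na(mapped.toy[["Entrez Gene"]])]

message(sprintf("Mapped %d of %d identifiers", n.mapped, n.total))
print(unmapped)
```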
df <- read.csv("https://cytoscape.org/cytoscape-tutorials/protocols/data/TCGA-Ovarian-MesenvsImmuno_data.csv")
head(df)
Now integrate the data with the network (node) table in Cytoscape.
- The key column for the network should be Entrez Gene.
- The key column for the data (df) should be Gene.
loadTableData(df, data.key.column = "Gene", table = "node", table.key.column = "Entrez Gene")
You will notice two new columns (logFC and FDR.adjusted.Pvalue) in the Node Table.
tail(getTableColumnNames('node'))
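Conceptually, matching on key columns resembles a left join from the node table to the data. The sketch below imitates that with base R `merge()` on toy tables; it is an analogy only, not how RCy3 implements loadTableData, and the expression values are made up.

```r
# Toy node table: one node has no Entrez Gene identifier
node.toy <- data.frame(
  "display name" = c("TP53", "BRCA1", "UNMAPPED_1"),
  "Entrez Gene" = c("7157", "672", NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Toy expression data keyed by Entrez Gene ID (made-up values)
expr.toy <- data.frame(
  Gene = c(7157, 672, 675),
  logFC = c(-0.5, 1.2, 0.3),
  FDR.adjusted.Pvalue = c(0.01, 0.2, 0.04)
)

# Compare keys as character so "7157" and 7157 match
expr.toy$Gene <- as.character(expr.toy$Gene)

# Left join: keep every node, fill NA where no data row matches
merged <- merge(node.toy, expr.toy,
                by.x = "Entrez Gene", by.y = "Gene", all.x = TRUE)
print(merged)
```

Nodes without a matching key keep NA in the new columns, which is why the mapping report from the previous step matters.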
We can now use the integrated data to create a network visualization.
logFC.table <- getTableColumns('node', "logFC")
logFC.min <- min(logFC.table, na.rm = TRUE)
logFC.max <- max(logFC.table, na.rm = TRUE)
print(logFC.min)
print(logFC.max)
getVisualStyleNames()
setVisualStyle("BioPAX")
setNodeFontSizeDefault(4, style.name = "BioPAX")
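# Use a symmetric limit so that a logFC of 0 maps to white
logFC.abs.max <- max(abs(logFC.min), abs(logFC.max))
setNodeColorMapping("logFC", c(-logFC.abs.max, 0, logFC.abs.max), c('#0000FF', '#FFFFFF', '#FF0000'), style.name = "BioPAX")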
setNodeColorDefault('#D3D3D3', style.name = "BioPAX")
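The color mapping uses three control points, with white at 0. The sketch below shows one way to derive symmetric limits from the data range and previews the blue-white-red gradient in base R. The logFC values are made up, and Cytoscape's own interpolation may differ in detail from `colorRamp`.

```r
# Toy logFC values (made up); NA stands for nodes without data
logFC.toy <- c(-1.8, -0.2, 0, 0.7, 3.1, NA)

# Symmetric limit: the largest absolute value in the data
limit <- max(abs(range(logFC.toy, na.rm = TRUE)))
control.points <- c(-limit, 0, limit)

# Linear blue-white-red gradient over [0, 1]
ramp <- colorRamp(c("#0000FF", "#FFFFFF", "#FF0000"))

# Rescale a logFC value to [0, 1] and convert it to a hex color
to_hex <- function(x) {
  scaled <- (x + limit) / (2 * limit)
  rgb(ramp(scaled), maxColorValue = 255)
}

print(control.points)
print(to_hex(c(-limit, 0, limit)))
```

With asymmetric limits (for example the raw minimum and maximum), equal up- and down-regulation would no longer receive equally saturated colors.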
Pro-tip: If you apply the yFiles Organic layout, you can end up with network views like this… (The yFiles Layout Algorithms app does not support automation, so select the layout from the Cytoscape Desktop menu bar.)
The TCGA found several genes that were commonly mutated in ovarian cancer, so-called “cancer drivers”. We can add information about these genes to the network visualization by changing the visual style of these nodes. Three of the most important drivers are TP53, BRCA1 and BRCA2. We will add a thicker, colored border for these genes in the network.
selectNodes(c("TP53", "BRCA1", "BRCA2"), by.col = "display name")
setNodeBorderWidthBypass(getSelectedNodes(), 5)
setNodeBorderColorBypass(getSelectedNodes(), '#FF007F')
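Before applying bypasses, it can help to confirm that the driver genes are actually present in the network. A minimal sketch, assuming the node names have been pulled into a character vector (here a made-up one; in a live session it could come from the display name column of the node table):

```r
drivers <- c("TP53", "BRCA1", "BRCA2")

# Hypothetical node names standing in for the network's display names
node.names <- c("TP53", "MYC", "BRCA1", "KRAS")

present <- intersect(drivers, node.names)   # drivers that can be highlighted
absent <- setdiff(drivers, node.names)      # drivers missing from the network

print(present)
if (length(absent) > 0) {
  message("Not in network: ", paste(absent, collapse = ", "))
}
```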
The network will now look like this:
Cytoscape provides a number of ways to export results and visualizations:
exportImage('./differentially-expressed-genes', 'PDF')
exportImage('./differentially-expressed-genes', 'PNG')
exportImage('./differentially-expressed-genes', 'JPEG')
exportImage('./differentially-expressed-genes', 'SVG')
exportImage('./differentially-expressed-genes', 'PS')
exportNetworkToNDEx("user", "password", TRUE)
exportNetwork('./differentially-expressed-genes', 'cyjs')